Okay, welcome everyone.
Let's start as usual with a small quiz.
Okay, some more taking pictures.
So the first question, I mean you can already start, it should be available.
So the first question is about speech features.
So we talked a little bit about speech production and how to produce speech features.
We learned that there are different categories, so looking more at these short-term spectral features,
and this is where MFCCs come into play.
And so the first question of this quiz is to order the algorithm in the correct way.
So some more people arriving, that's great.
Okay.
Good, so we have some votes. I would say, in the interest of time,
we will see what you got.
And I think I can also make that bigger here.
But then you don't see, okay, I guess it's enough in that way.
So some people didn't get it right, but that is not so bad, it's a difficult question.
I mean, you have to order quite a few parts here.
So let's go maybe back to the original order here.
So first, in the first stage, you have your speech signal and you have to
take some overlapping windows.
And typical window sizes are between 10 and 30 milliseconds, so 20 or 25 milliseconds
is a quite usual window size for these MFCCs.
So afterwards you apply a Hamming window, so this is stage B, because the Fourier
transform is strictly only applicable if you have a periodic signal,
so periodic samples.
And in this way you can at least mitigate this problem.
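To make these first two stages concrete, here is a minimal sketch in plain Python. The 25 ms window comes from the range mentioned above; the 10 ms hop, the 16 kHz sample rate, the 440 Hz test tone, and all function names are assumptions chosen for illustration only.

```python
import math

def frame_signal(signal, sample_rate, win_ms=25.0, hop_ms=10.0):
    """Split a signal (list of floats) into overlapping frames.

    win_ms and hop_ms are illustrative defaults, not values fixed by the lecture.
    """
    win_len = int(round(sample_rate * win_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(signal) - win_len + 1, hop_len):
        frames.append(signal[start:start + win_len])
    return frames

def hamming(n):
    """Hamming window of length n: 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * k / (n - 1)) for k in range(n)]

def apply_window(frame, window):
    """Multiply a frame sample by sample with the window."""
    return [x * w for x, w in zip(frame, window)]

# Usage: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sample_rate = 16000
signal = [math.sin(2.0 * math.pi * 440.0 * t / sample_rate) for t in range(sample_rate)]
frames = frame_signal(signal, sample_rate)
windowed = [apply_window(f, hamming(len(f))) for f in frames]
```

The window tapers each frame towards its edges, which reduces the jump that would appear if the frame were repeated periodically.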
And afterwards you compute the power spectrum or cepstrum, so this is a little bit dependent
on the algorithm you're using specifically.
I think the original version used just the power spectrum, and more modern versions
use the cepstrum.
Here you then apply the DFT directly on these short windows.
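As a rough sketch of the power-spectrum variant of this step, the following computes a naive DFT per frame in plain Python. The function name and the 1/N scaling are assumptions; in practice one would use an FFT routine instead of this O(N^2) loop.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive O(N^2) DFT of one frame; returns |X[k]|^2 / N for k = 0..N//2.

    Only the non-negative frequencies are kept, since the input is real.
    The 1/N scaling is one common convention, assumed here.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * cmath.pi * k * t / n)
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum

# Usage: a cosine that completes exactly two cycles in 8 samples
# puts all of its energy into bin k = 2.
toy_frame = [math.cos(2.0 * math.pi * 2 * t / 8) for t in range(8)]
toy_power = power_spectrum(toy_frame)
```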
Afterwards you integrate over so-called mel-scaled bands, and this just means you now have your
frequencies and you basically average different bands together, as given by these
mel-scaled filter banks.
So some will go only, I don't know, from 8,000 to 10,000 hertz, and over those you average,
or rather you filter with the coefficients of this mel filter bank.
And afterwards you apply a DCT.
This is for decorrelating the signal even further. I mean, you
still have overlap in the frequencies, since the filters of this mel-frequency filter bank
overlap, and to decorrelate this further you apply this DCT. And this is
basically then the last step, and you get out the MFCCs. Then typically what you
also do is compute first- and second-order derivatives and add
them as additional features, because a frame typically depends on the previous
signal. Yeah, this was the first question, a little bit about MFCCs.
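The last stage can be sketched as follows in plain Python. Taking the logarithm of the mel energies before the DCT is the standard step, although it was not spelled out above. The number of coefficients (13), the unnormalised DCT-II, the simple central difference for the derivatives, the toy input values, and all names are assumptions for illustration.

```python
import math

def dct_ii(values, num_coeffs):
    """Unnormalised DCT-II: c[i] = sum_j v[j] * cos(pi * i * (j + 0.5) / N)."""
    n = len(values)
    return [sum(v * math.cos(math.pi * i * (j + 0.5) / n)
                for j, v in enumerate(values))
            for i in range(num_coeffs)]

def mfcc_from_mel_energies(mel_energies, num_coeffs=13, eps=1e-10):
    """Log-compress the mel energies, then decorrelate them with a DCT."""
    log_energies = [math.log(e + eps) for e in mel_energies]  # eps avoids log(0)
    return dct_ii(log_energies, num_coeffs)

def deltas(frames):
    """Central difference along time; the edge frames reuse their nearest neighbour."""
    n = len(frames)
    out = []
    for t in range(n):
        prev = frames[max(t - 1, 0)]
        nxt = frames[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out

# Usage: three toy frames of 26 mel energies each (made-up values).
toy_mel_frames = [[1.0 + 0.1 * t + 0.01 * j for j in range(26)] for t in range(3)]
mfccs = [mfcc_from_mel_energies(e) for e in toy_mel_frames]
d1 = deltas(mfccs)   # first-order derivatives
d2 = deltas(d1)      # second-order derivatives
features = [m + a + b for m, a, b in zip(mfccs, d1, d2)]  # 39 values per frame
```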
Let's continue to the second one, and I think I have to click here. So it's a rather
straightforward question, I would say. So we learned several image-based features,
and some of them are based on gradient orientations, and maybe you remember
especially one that has it even in its name. This is why I abbreviated the
algorithms here, but maybe you remember what that was.
Okay, seems to be not so difficult for you. Already 11 people voted. Maybe in
Presenters
Accessible via
Open access
Duration
01:30:45 min
Recording date
2022-07-01
Uploaded on
2022-07-01 18:19:05
Language
en-US